Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model

نویسندگان

  • Chun-Jen Lee
  • Jason S. Chang
چکیده

This paper presents a framework for extracting English and Chinese transliterated word pairs from parallel texts. The approach is based on the statistical machine transliteration model to exploit the phonetic similarities between English words and corresponding Chinese transliterations. For a given proper noun in English, the proposed method extracts the corresponding transliterated word from the aligned text in Chinese. Under the proposed approach, the parameters of the model are automatically learned from a bilingual proper name list. Experimental results show that the average rates of word and character precision are 86.0% and 94.4%, respectively. The rates can be further improved with the addition of simple linguistic processing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

English-Chinese Transliteration Word Pair Extraction from Parallel Corpora

Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...

متن کامل

Backward Machine Transliteration by Learning Phonetic Similarity

In many cross-lingual applications we need to convert a transliterated word into its original word. In this paper, we present a similarity-based framework to model the task of backward transliteration, and provide a learning algorithm to automatically acquire phonetic similarities from a corpus. The learning algorithm is based on Widrow-Hoff rulewithsomemodifications. The experiment results sho...

متن کامل

Chinese-to-English Backward Machine Transliteration

It is challenging to transliterate named entities across languages. It is even more challenging to backward transliterate the transliterated term into its original form. This paper addresses the problem of backward translating person name from Chinese to its English counterpart. We propose a statistical backward transliteration method. Our method uses English sub-syllable and Chinese syllable a...

متن کامل

Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer

Recently, many studies have been focused on extracting transliteration pairs from bilingual texts. Most of these studies are based on the statistical transliteration model. The paper discusses the limitations of previous approaches and proposes novel approaches called dynamic window and tokenizer to overcome these limitations. Experimental results show that the average rates of word and charact...

متن کامل

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

This paper describes a method to mine Hindi-English transliteration pairs from online Hindi song lyrics. The technique is based on the observations that lyrics are transliterated word-by-word, maintaining the precise word order. The mining task is nevertheless challenging because the Hindi lyrics and its transliterations are usually available from different, often unrelated, websites. Therefore...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003